Chizat and Bach
We thank the reviewers for their time and constructive feedback on the submission, which we will incorporate to improve our manuscript.
We find that they are positive-definite as expected. "A Note on Lazy Training in Supervised Differentiable Programming" by Chizat and Bach is an important contribution, and we will absolutely cite and discuss it; note, however, that the results in question (Sec 2.2 in V1, V2) are restricted to single-hidden-layer networks. It is still an open research question to determine which factors drive these performance gaps. We will expand the discussion around this.
Directional convergence and alignment in deep learning
The above theories, in the finite-width setting, usually require the weights to stay close to initialization in certain norms. By contrast, practitioners run their optimization methods as long as their computational budget allows [Shallue et al., 2018], and if the data can be perfectly classified, the cross-entropy loss has no finite minimizer, so the weights grow without bound and travel far from initialization.
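As a small worked illustration of why perfectly classified data pushes the weights far from initialization (a standard argument, not quoted from the paper): for the logistic loss on separable data, scaling any separating weight vector strictly decreases the empirical risk, so the risk has no finite minimizer,
$$\widehat{\mathcal{R}}(c\,w) \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\!\bigl(1+e^{-c\,y_i\,w^{\top}x_i}\bigr)\;\xrightarrow[c\to\infty]{}\;0, \qquad \text{while } \widehat{\mathcal{R}}(w)>0 \text{ for every finite } w,$$
and gradient-based methods that drive the loss toward zero therefore let $\|w\|\to\infty$; this is exactly the regime where directional convergence and alignment become the relevant questions.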
Reviews: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
The paper was carefully proofread, well-structured, and very clear. The experiments were clearly described in detail and provided relevant results. Below we outline some detailed comments on the results. In particular, Chizat and Bach prove that the training of an NTK-parameterized network is closely modeled by "lazy training" (their terminology for a linearized model). This paper is not referenced in the related work section.
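For concreteness, the linearized model the reviewer refers to is the first-order Taylor expansion of the network around its initialization (a standard statement of lazy training, not quoted from either paper):
$$f_{\mathrm{lin}}(x;\theta)\;=\;f(x;\theta_0)\;+\;\nabla_\theta f(x;\theta_0)^{\top}(\theta-\theta_0),\qquad K_{\mathrm{NTK}}(x,x')\;=\;\nabla_\theta f(x;\theta_0)^{\top}\nabla_\theta f(x';\theta_0).$$
Lazy training means that gradient descent on $f$ stays uniformly close to gradient descent on $f_{\mathrm{lin}}$, so the trained network behaves like a kernel method with kernel $K_{\mathrm{NTK}}$.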
Regression as Classification: Influence of Task Formulation on Neural Network Features
Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature space, we explore how the implicit bias induced by gradient-based optimization could partly explain the above phenomenon. We provide theoretical evidence that the regression formulation yields a measure whose support can differ greatly from that for classification, in the case of one-dimensional data. Our proposed optimal supports correspond directly to the features learned by the input layer of the network. The different nature of these supports sheds light on possible optimization difficulties the square loss could encounter during training, and we present empirical results illustrating this phenomenon.
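A minimal sketch of the two task formulations compared above, assuming a PyTorch setup; the architecture, bin count, and hyperparameters are illustrative choices of ours, not the authors' experimental configuration:

```python
# Sketch: the same two-layer ReLU network trained on a 1-D regression task,
# once with the square loss and once by binning the targets into classes and
# minimizing cross entropy. All hyperparameters are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 200).unsqueeze(1)            # one-dimensional inputs
y = torch.sin(3 * x)                                    # continuous regression targets

n_bins = 20                                             # classification reformulation: discretize y
edges = torch.linspace(float(y.min()), float(y.max()), n_bins + 1)
labels = torch.bucketize(y.squeeze(1), edges[1:-1])     # class index per sample

def two_layer(out_dim, width=256):
    return nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, out_dim))

tasks = [
    ("square loss", two_layer(1), nn.MSELoss(), y),
    ("cross entropy", two_layer(n_bins), nn.CrossEntropyLoss(), labels),
]

for name, net, loss_fn, target in tasks:
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(x), target)
        loss.backward()
        opt.step()
    print(f"{name}: final training loss {loss.item():.4f}")
```

Inspecting the first-layer weights and biases of the two trained networks (the breakpoints of the learned ReLU units, i.e., the features in the abstract's sense) is then a direct way to see how the supports induced by the two formulations differ.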
Feature selection with gradient descent on two-layer networks in low-rotation regimes
This work establishes low test error of gradient flow (GF) and stochastic gradient descent (SGD) on two-layer ReLU networks with standard initialization, in three regimes where key sets of weights rotate little (either naturally due to GF and SGD, or due to an artificial constraint), making use of margins as the core analytic technique. The first regime is near initialization, specifically until the weights have moved by $\mathcal{O}(\sqrt m)$, where $m$ denotes the network width, which is in sharp contrast to the $\mathcal{O}(1)$ weight motion allowed by the Neural Tangent Kernel (NTK); here it is shown that GF and SGD only need a network width and number of samples inversely proportional to the NTK margin, and moreover that GF attains at least the NTK margin itself, which suffices to establish escape from bad KKT points of the margin objective, whereas prior work could only establish nondecreasing but arbitrarily small margins. The second regime is the Neural Collapse (NC) setting, where data lies in extremely well-separated groups, and the sample complexity scales with the number of groups; here the contribution over prior work is an analysis of the entire GF trajectory from initialization. Lastly, if the inner-layer weights are constrained to change in norm only and cannot rotate, then GF with large widths achieves globally maximal margins, and its sample complexity scales with their inverse; this is in contrast to prior work, which required infinite width and a tricky dual convergence assumption. As purely technical contributions, this work develops a variety of potential functions and other tools which will hopefully aid future work.
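To fix notation for the margins used as the core technique above, one common normalization for a 2-homogeneous two-layer ReLU network is the following (our choice of normalization for illustration; the paper's exact definitions may differ):
$$f(x;\theta)=\frac{1}{\sqrt m}\sum_{j=1}^{m}a_j\,\max\{0,\,w_j^{\top}x\},\qquad \gamma(\theta)=\frac{1}{\|\theta\|_2^{2}}\,\min_{1\le i\le n} y_i\,f(x_i;\theta),$$
where the NTK margin is the analogous quantity for the model linearized at initialization; the width and sample requirements quoted in the abstract scale inversely with these margins.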